Generation of F0 contours using a model-constrained data-driven method
Abstract
This paper introduces a novel model-constrained, data-driven method for generating fundamental frequency contours in Japanese text-to-speech synthesis. In the training phase, the parameters of a command-response F0 contour generation model are learned by a prediction module, which can be a neural network or a set of binary regression trees. The input features consist of linguistic information related to accentual phrases that can be automatically derived from text, such as the position of the accentual phrase in the utterance, number of morae, accent type, and parts-of-speech. In the synthesis phase, the prediction module is used to generate appropriate values of model parameters. The use of the parametric model restricts the degrees of freedom of the problem, facilitating data-driven learning. Experimental results show that the method makes it possible to generate quite natural F0 contours with a relatively small training database.
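The abstract does not spell out the command-response model. The sketch below assumes it takes the widely used Fujisaki form: log F0 is a baseline plus phrase components (impulse responses) plus accent components (step responses). The function names, the constants alpha, beta and gamma, and all command values are illustrative assumptions, not values from the paper. In the method described above the command values would come from the prediction module; here they are hand-picked.

```python
import math

# Assumed conventional constants for the Fujisaki-style model (not from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase mechanism; zero before the command."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time point.

    fb          : baseline frequency in Hz
    phrase_cmds : list of (onset_time, magnitude)
    accent_cmds : list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands for one accentual phrase (hand-picked, not predicted).
times = [i * 0.01 for i in range(200)]          # 2 s sampled every 10 ms
f0 = f0_contour(times, fb=120.0,
                phrase_cmds=[(0.0, 0.5)],
                accent_cmds=[(0.3, 0.8, 0.4)])
```

Because a whole accentual phrase is described by a handful of command values rather than by one F0 value per frame, the predictor has far fewer targets to learn, which is the restriction on degrees of freedom that the abstract refers to.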
Similar resources
Corpus-based synthesis of fundamental frequency contours based on a generation process model
A model-constrained, corpus-based synthesis strategy was developed for fundamental frequency (F0) contours of Japanese sentences. In the training phase, the relationship between linguistic factors and the command values (amplitudes and locations) of the F0 contour generation process model was learned by a prediction module, which is a neural network in the current paper. Input parameters consist of linguisti...
Data-Driven Synthesis of Fundamental Frequency Contours for TTS Systems Based on a Generation Process Model
A data-driven method of fundamental frequency (F0) contour synthesis was developed for Japanese text-to-speech (TTS) conversion systems. In the method, synthesis is done using the F0 contour generation process model, and the model parameters for each accent phrase are estimated using statistical methods. Although it was already shown that the synthesized F0 contours sounded highly natural as th...
Generation of fundamental frequency contours for Mandarin speech synthesis based on tone nucleus model
A new method for generating sentence F0 contours of Mandarin speech is proposed. The method assumes the F0 contour generation process model, but generates the tone and phrase components in different ways and sums them to produce a sentence F0 contour. The tone component is generated by concatenating F0 patterns of tone nuclei, which are predicted by a corpus-based scheme (binary decision trees). E...
Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis
The generation process model of fundamental frequency (F0) contours can represent the F0 movements of speech well while keeping a clear relation to the linguistic information of utterances. Using the model is therefore expected to improve HMM-based speech synthesis. One of the major problems preventing the use of the model is that the performance of automatic extraction of the model parameters from observ...
Synthesis and evaluation of intonation with a superposition model
A data-driven method based on a new paradigm is introduced in this paper. We assume that cognitive representations of the discourse are prosodically encoded by means of global multiparametric prototypes. The generation of adequate prosodic contours is then obtained by retrieving and combining these elementary prototypic contours accessed by linguistic or paralinguistic keys. We examine here F0 ...
Publication date: 2001